Segmentation of Handwritten Documents Containing Kannada Script

Authors

  • Saleem Pasha
  • M. C. Padma
  • Nayana N Shetty
  • Alireza Alaei
  • P. Nagabhushan
Abstract

Segmentation is one of the important phases of an Optical Character Recognition (OCR) system: it extracts objects of interest from an image. The feature extraction and classification phases of OCR are more effective when the techniques selected for segmentation are effective. This paper develops a system for handwritten documents containing Kannada script and proposes suitable techniques for preprocessing and for line, word, and character segmentation. The novelty lies in a modified horizontal projection profile method for line segmentation, which detects both well-separated and overlapping lines. An average accuracy of 97.5% is achieved for line segmentation and word segmentation.
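
For readers unfamiliar with projection profiles, the following is a minimal sketch of the plain (unmodified) horizontal projection profile idea, not the authors' modified method, whose details are not given in the abstract. It assumes a binarized page where 1 marks ink and 0 marks background; the names `horizontal_projection`, `segment_lines`, and `min_ink` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple


def horizontal_projection(binary: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in binary]


def segment_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows
    whose ink count is at least min_ink (an assumed threshold)."""
    profile = horizontal_projection(binary)
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # a gap closes the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 6x4 "page" with two well-separated text lines.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
print(segment_lines(page))
```

A plain profile like this only separates lines that have an empty gap between them. Overlapping or touching lines produce no zero-valued rows, which is presumably the case the paper's modification is meant to handle.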

Similar resources

Dataset and Ground Truth for Handwritten Text in Four Different Scripts

In document image analysis (DIA) especially in handwritten document recognition, standard databases play significant roles for evaluating performances of algorithms and comparing results obtained by different groups of researchers. The field of DIA regard to Indo-Persian documents is still at its infancy compared to Latin script-based documents; as such standard datasets are not still available in ...

Implicit segmentation of Kannada characters in offline handwriting recognition using hidden Markov models

We describe a method for classification of handwritten Kannada characters using Hidden Markov Models (HMMs). Kannada script is agglutinative, where simple shapes are concatenated horizontally to form a character. This results in a large number of characters making the task of classification difficult. Character segmentation plays a significant role in reducing the number of classes. Explicit se...

Handwritten Script Identification: Fusion based Approaches

Script identification is one of the preprocessing steps in any document image processing task. Script identification in printed documents has achieved a greater attention whereas script identification in handwritten documents has achieved less attention from document research community. Almost all the existing works have made attempts on identifying suitable features or classifiers for handwrit...

Review: A Literature Survey on Text Segmentation in Handwritten Punjabi Documents

Gurumukhi script is used for Punjabi language, which is a two dimensional composition of symbols with connected and disconnected diacritics. Handwritten Gurumukhi script has some complexities like connected, overlapped text lines, words and characters. It is one of the foremost issues for errors during the recognition process. Text segmentation is a challenging job in unconstrained writer indep...

OCR for Handwritten Kannada Language Script

The optical character recognition (OCR) is the process of converting textual scanned image into a computer editable format. The proposed OCR system is for complex handwritten Kannada characters. One of the major challenges faced by Kannada OCR system is recognition of handwritten text from an image. The input text image is subjected to preprocessing and then converted into binary image. Segment...

Publication date: 2016